AITopics | statistical testing

Collaborating Authors

statistical testing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset

Rudra, Manjari, Magleby, Daniel, Sikdar, Sujoy

arXiv.org Artificial IntelligenceSep-29-2025

Questions in political interviews and hearings serve strategic purposes beyond information gathering including advancing partisan narratives and shaping public perceptions. However, these strategic aspects remain understudied due to the lack of large-scale datasets for studying such discourse. Congressional hearings provide an especially rich and tractable site for studying political questioning: Interactions are structured by formal rules, witnesses are obliged to respond, and members with different political affiliations are guaranteed opportunities to ask questions, enabling comparisons of behaviors across the political spectrum. We develop a pipeline to extract question-answer pairs from unstructured hearing transcripts and construct a novel dataset of committee hearings from the 108th--117th Congress. Our analysis reveals systematic differences in questioning strategies across parties, by showing the party affiliation of questioners can be predicted from their questions alone. Our dataset and methods not only advance the study of congressional politics, but also provide a general framework for analyzing question-answering across interview-like settings.

large language model, machine learning, utterance, (21 more...)

arXiv.org Artificial Intelligence

2509.21548

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

CDI: Copyrighted Data Identification in Diffusion Models

Dubiński, Jan, Kowalczuk, Antoni, Boenisch, Franziska, Dziedzic, Adam

arXiv.org Artificial IntelligenceNov-24-2024

Diffusion Models (DMs) benefit from large and diverse datasets for their training. Since this data is often scraped from the Internet without permission from the data owners, this raises concerns about copyright and intellectual property protections. While (illicit) use of data is easily detected for training samples perfectly re-created by a DM at inference time, it is much harder for data owners to verify if their data was used for training when the outputs from the suspect DM are not close replicas. Conceptually, membership inference attacks (MIAs), which detect if a given data point was used during training, present themselves as a suitable tool to address this challenge. However, we demonstrate that existing MIAs are not strong enough to reliably determine the membership of individual images in large, state-of-the-art DMs. To overcome this limitation, we propose CDI, a framework for data owners to identify whether their dataset was used to train a given DM. CDI relies on dataset inference techniques, i.e., instead of using the membership signal from a single data point, CDI leverages the fact that most data owners, such as providers of stock photography, visual media companies, or even individual artists, own datasets with multiple publicly exposed data points which might all be included in the training of a given DM. By selectively aggregating signals from existing MIAs and using new handcrafted methods to extract features for these datasets, feeding them to a scoring model, and applying rigorous statistical testing, CDI allows data owners with as little as 70 data points to identify with a confidence of more than 99% whether their data was used to train a given DM. Thereby, CDI represents a valuable tool for data owners to claim illegitimate use of their copyrighted data.

cdi, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.12858

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Orange County > Anaheim (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Grounding Representation Similarity Through Statistical Testing

Neural Information Processing SystemsOct-9-2024, 11:55:57 GMT

To understand neural network behavior, recent works quantitatively compare different networks' learned representations using canonical correlation analysis (CCA), centered kernel alignment (CKA), and other dissimilarity measures. Unfortunately, these widely used measures often disagree on fundamental observations, such as whether deep networks differing only in random initialization learn similar representations. These disagreements raise the question: which, if any, of these dissimilarity measures should we believe? We provide a framework to ground this question through a concrete test: measures should have \emph{sensitivity} to changes that affect functional behavior, and \emph{specificity} against changes that do not. We quantify this through a variety of functional behaviors including probing accuracy and robustness to distribution shift, and examine changes such as varying random initialization and deleting principal components.

artificial intelligence, grounding representation similarity, machine learning, (4 more...)

Neural Information Processing Systems

Country: Europe > France (0.10)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Add feedback

The Perturbed Variation

Neural Information Processing SystemsMar-14-2024, 03:11:47 GMT

We introduce a new discrepancy score between two distributions that gives an indication on their similarity. While much research has been done to determine if two samples come from exactly the same distribution, much less research considered the problem of determining if two finite samples come from similar distributions. The new score gives an intuitive interpretation of similarity; it optimally perturbs the distributions so that they best fit each other. The score is defined between distributions, and can be efficiently estimated from samples. We provide convergence bounds of the estimated score, and develop hypothesis testing procedures that test if two data sets come from similar distributions. The statistical power of this procedures is presented in simulations. We also compare the score's capacity to detect similarity with that of other known measures on real data.

dimension, perturbation, similarity, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MMD-B-Fair: Learning Fair Representations with Statistical Testing

Deka, Namrata, Sutherland, Danica J.

arXiv.org Artificial IntelligenceApr-25-2023

We introduce a method, MMD-B-Fair, to learn fair representations of data via kernel two-sample testing. We find neural features of our data where a maximum mean discrepancy (MMD) test cannot distinguish between representations of different sensitive groups, while preserving information about the target attributes. Minimizing the power of an MMD test is more difficult than maximizing it (as done in previous work), because the test threshold's complex behavior cannot be simply ignored. Our method exploits the simple asymptotics of block testing schemes to efficiently find fair representations without requiring complex adversarial optimization or generative modelling schemes widely used by existing work on fair representation learning. We evaluate our approach on various datasets, showing its ability to ``hide'' information about sensitive attributes, and its effectiveness in downstream transfer tasks.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Artificial Intelligence

2211.07907

Country:

North America > Canada > British Columbia (0.04)
North America > Canada > Quebec (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

What's the Difference Between a Metric and a Loss Function?

#artificialintelligenceJun-18-2022, 01:20:32 GMT

Have you been using your loss function for evaluating your machine learning system's performance? That's a mistake, but don't worry, you're not alone. It's a widespread misunderstanding that may have something to do with software defaults, college course format, and decision-maker absenteeism in AI. In this article, I'll explain why you need two separate model scoring functions for evaluation and optimization… and possibly a third one for statistical testing. Throughout data science, you'll see scoring functions (like the MSE, for example) being used for three main purposes: These three are subtly -- but importantly -- different from one another, so let's take a deeper look at what makes a function "good" for each purpose.

loss function, optimization, performance evaluation, (15 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Interaction-Aware Sensitivity Analysis for Aerodynamic Optimization Results using Information Theory

Wollstadt, Patricia, Schmitt, Sebastian

arXiv.org Artificial IntelligenceDec-10-2021

An important issue during an engineering design process is to develop an understanding which design parameters have the most influence on the performance. Especially in the context of optimization approaches this knowledge is crucial in order to realize an efficient design process and achieve high-performing results. Information theory provides powerful tools to investigate these relationships because measures are model-free and thus also capture non-linear relationships, while requiring only minimal assumptions on the input data. We therefore propose to use recently introduced information-theoretic methods and estimation algorithms to find the most influential input parameters in optimization results. The proposed methods are in particular able to account for interactions between parameters, which are often neglected but may lead to redundant or synergistic contributions of multiple parameters. We demonstrate the application of these methods on optimization data from aerospace engineering, where we first identify the most relevant optimization parameters using a recently introduced information-theoretic feature-selection algorithm that accounts for interactions between parameters. Second, we use the novel partial information decomposition (PID) framework that allows to quantify redundant and synergistic contributions between selected parameters with respect to the optimization outcome to identify parameter interactions. We thus demonstrate the power of novel information-theoretic approaches in identifying relevant parameters in optimization runs and highlight how these methods avoid the selection of redundant parameters, while detecting interactions that result in synergistic contributions of multiple parameters.

evolutionary algorithm, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2112.05609

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre: Research Report (0.82)

Industry: Aerospace & Defense (0.86)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Add feedback

Better than Average: Paired Evaluation of NLP Systems

Peyrard, Maxime, Zhao, Wei, Eger, Steffen, West, Robert

arXiv.org Artificial IntelligenceOct-20-2021

Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances. In this work, we question the use of averages for aggregating evaluation scores into a final number used to decide which system is best, since the average, as well as alternatives such as the median, ignores the pairing arising from the fact that systems are evaluated on the same test instances. We illustrate the importance of taking the instance-level pairing of evaluation scores into account and demonstrate, both theoretically and empirically, the advantages of aggregation methods based on pairwise comparisons, such as the Bradley-Terry (BT) model, a mechanism based on the estimated probability that a given system scores better than another on the test set. By re-evaluating 296 real NLP evaluation setups across four tasks and 18 evaluation metrics, we show that the choice of aggregation mechanism matters and yields different conclusions as to which systems are state of the art in about 30% of the setups. To facilitate the adoption of pairwise evaluation, we release a practical tool for performing the full analysis of evaluation scores with the mean, median, BT, and two variants of BT (Elo and TrueSkill), alongside functionality for appropriate statistical testing.

computational linguistic, median, proceedings, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2021.acl-long.179

2110.10746

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(12 more...)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

Add feedback

Statistical Testing for Efficient Out of Distribution Detection in Deep Neural Networks

Haroush, Matan, Frostig, Tzivel, Heller, Ruth, Soudry, Daniel

arXiv.org Machine LearningFeb-25-2021

Commonly, Deep Neural Networks (DNNs) generalize well on samples drawn from a distribution similar to that of the training set. However, DNNs' predictions are brittle and unreliable when the test samples are drawn from a dissimilar distribution. This presents a major concern for deployment in real-world applications, where such behavior may come at a great cost -- as in the case of autonomous vehicles or healthcare applications. This paper frames the Out Of Distribution (OOD) detection problem in DNN as a statistical hypothesis testing problem. Unlike previous OOD detection heuristics, our framework is guaranteed to maintain the false positive rate (detecting OOD as in-distribution) for test data. We build on this framework to suggest a novel OOD procedure based on low-order statistics. Our method achieves comparable or better than state-of-the-art results on well-accepted OOD benchmarks without retraining the network parameters -- and at a fraction of the computational cost.

distribution detection, hypothesis, statistical testing, (14 more...)

arXiv.org Machine Learning

2102.12967

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.33)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Seven Sins of Machine Learning

#artificialintelligenceJun-26-2020, 02:46:31 GMT

Machine learning is a great tool that is revolutionizing our world right now. There are lots of great applications in which machine and in particular deep learning has shown to be way superior to traditional methods. Beginning from Alex-Net for Image Classification to U-Net for Image Segmentation, we see great successes in computer vision and medical image processing. Still, I see machine learning methods fail every day. In many of these situations, people fell for one of the seven sins of machine learning.

artificial intelligence, experiment, machine learning, (16 more...)

#artificialintelligence

Genre: Research Report (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback